The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
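As an illustration of the patch-based strategy mentioned in the survey, the following is a minimal sketch of how a sample that is too large to process at once might be tiled into overlapping patches along one axis. The function name `patch_starts` and the choice to shift the last patch back so that it ends at the border are assumptions made for this example; the survey does not describe any particular tiling scheme.

```python
def patch_starts(length, patch, stride):
    """Return start offsets of patches of size `patch` that cover [0, length).

    Hypothetical helper: the last patch is shifted back so it ends exactly
    at the border instead of running past it.
    """
    if patch >= length:
        return [0]  # a single patch already covers the whole axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # cover the remaining tail
    return starts


def patch_grid(height, width, patch, stride):
    """Combine per-axis offsets into (row, col) corners for a 2D image."""
    return [(r, c)
            for r in patch_starts(height, patch, stride)
            for c in patch_starts(width, patch, stride)]
```

For a 3D volume, the same per-axis offsets could be combined over three axes; processing each slice separately instead would correspond to the "series of 2D tasks" strategy.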
High-fidelity facial avatar reconstruction from a monocular video is a significant research problem in computer graphics and computer vision. Recently, the Neural Radiance Field (NeRF) has shown impressive novel view rendering results and has been considered for facial avatar reconstruction. However, the complex facial dynamics and missing 3D information in monocular videos raise significant challenges for faithful facial reconstruction. In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes a 3D-aware generative prior. Unlike existing works that depend on a conditional deformation field for dynamic modeling, we propose to learn a personalized generative prior, which is formulated as a local, low-dimensional subspace in the latent space of a 3D-GAN. We propose an efficient method to construct the personalized generative prior from a small set of facial images of a given individual. After learning, it allows for photo-realistic rendering from novel views, and face reenactment can be realized by navigating the latent space. Our proposed method is applicable to different driving signals, including RGB images, 3DMM coefficients, and audio. Compared with existing works, we obtain superior novel view synthesis results and faithful face reenactment performance.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, achieving up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
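To make the notion of an INT8 model more concrete, the sketch below shows the generic affine quantization scheme that maps floating-point values to signed 8-bit codes and back. It is only an illustration: the abstract does not state which quantization toolchain or parameters the participants used, and the helper names (`choose_params`, `quantize_int8`, `dequantize_int8`) are hypothetical.

```python
def choose_params(lo, hi):
    """Pick scale and zero point so that [lo, hi] spans the codes -128..127."""
    scale = (hi - lo) / 255.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize_int8(values, scale, zero_point):
    """Affine quantization: q = round(x / scale) + zero_point, clamped to INT8."""
    codes = []
    for x in values:
        q = round(x / scale) + zero_point
        codes.append(max(-128, min(127, q)))  # clamp to the signed 8-bit range
    return codes


def dequantize_int8(codes, scale, zero_point):
    """Approximate inverse mapping: x is roughly (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]
```

For values inside the calibrated range, the round-trip error of this scheme is bounded by half a quantization step (`scale / 2`), which is the precision a quantized network has to tolerate.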
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking, and many other mobile tasks. It is therefore crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single-image depth estimation solutions that can show real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, which is capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high-fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile device; their detailed description is provided in this paper.
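We tackle the problem of estimating correspondences from a general marker, such as a movie poster, to an image that captures such a marker. Conventionally, this problem is addressed by fitting a homography model based on sparse feature matching. However, such methods can only handle plane-like markers, and sparse features do not make full use of appearance information. In this paper, we propose a novel framework, NeuralMarker, which trains a neural network to estimate dense marker correspondences under various challenging conditions, such as marker deformation and harsh lighting. In addition, we propose a novel marker correspondence evaluation method based on annotations of real marker-image pairs and create a new benchmark. We show that NeuralMarker significantly outperforms previous methods and enables interesting new applications, including augmented reality (AR) and video editing.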
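In recent years, attention-based scene text recognition methods have been very popular and have attracted the interest of many researchers. Attention-based methods can focus attention on a small area or even a single point during decoding, in which case the attention matrix is nearly a one-hot distribution. Furthermore, during inference every attention matrix weights the entire feature map, which causes a large amount of redundant computation. In this paper, we propose an efficient attention-free Single-Point Decoding Network (dubbed SPDN) for scene text recognition, which can replace the traditional attention-based decoding network. Specifically, we propose a Single-Point Sampling Module (SPSM) that efficiently samples one key point on the feature map for decoding one character. In this way, our method can not only precisely locate the key point of each character but also remove redundant computation. Based on SPSM, we design an efficient and novel single-point decoding network to replace the attention-based decoding network. Extensive experiments on publicly available benchmarks verify that our SPDN can greatly improve decoding efficiency without sacrificing performance.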
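The redundancy argument can be illustrated with a small sketch: when the attention weights are one-hot, the weighted sum over every position of the feature map returns exactly the feature at the selected position, so reading that single point directly gives the same result at a fraction of the cost. This is only a toy illustration of the observation; how SPSM actually predicts the key point is not described in the abstract, so the index is supplied by hand here and both function names are hypothetical.

```python
def attention_readout(features, weights):
    """Weighted sum over all positions; cost grows with the number of positions."""
    dim = len(features[0])
    out = [0.0] * dim
    for feat, w in zip(features, weights):
        for d in range(dim):
            out[d] += w * feat[d]
    return out


def single_point_readout(features, index):
    """Read the feature at one position; cost is independent of the map size."""
    return list(features[index])


# Toy feature map with three positions and two channels (made-up values).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
one_hot = [0.0, 1.0, 0.0]  # attention concentrated on position 1

same = attention_readout(features, one_hot) == single_point_readout(features, 1)
```

With these values `same` is `True`: the full weighted sum touches all three positions only to reproduce the feature stored at position 1.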
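Knowledge distillation (KD) has been developed extensively and has boosted various tasks. The classical KD method adds the KD loss to the original cross-entropy (CE) loss. We decompose the KD loss to explore its relation to the CE loss. Surprisingly, we find that it can be regarded as a combination of the CE loss and an extra loss that has the same form as the CE loss. However, we notice that the extra loss forces the student's relative probabilities to learn the teacher's absolute probabilities. Moreover, the sums of the two sets of probabilities differ, which makes the loss hard to optimize. To address this issue, we revise the formulation and propose a distributed loss. In addition, we use the teacher's target output as the soft target and propose a soft loss. Combining the soft loss and the distributed loss, we propose a new KD loss (NKD). Furthermore, we smooth the student's target output so that it can serve as the soft target for training without a teacher, and propose a teacher-free new KD loss (TF-NKD). Our method achieves state-of-the-art performance on CIFAR-100 and ImageNet. For example, with ResNet-34 as the teacher, we boost the ImageNet Top-1 accuracy of ResNet-18 from 69.90% to 71.96%. In training without a teacher, MobileNet, ResNet-18, and SwinTransformer-Tiny achieve 70.04%, 70.76%, and 81.48%, which are 0.83%, 0.86%, and 0.30% higher than the baselines, respectively. The code is available at https://github.com/yzd-v/cls_kd.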
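For readers unfamiliar with the starting point of this abstract, the sketch below shows the classical KD objective for a single sample: the CE loss on the ground-truth label plus a KD term that compares temperature-softened teacher and student distributions. It is not the NKD or TF-NKD loss, whose formulas the abstract does not give. The weighting factor `alpha`, the default temperature, and the multiplication of the KD term by the squared temperature are assumptions that follow a common convention.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target):
    """Standard CE loss against the ground-truth class index."""
    return -math.log(softmax(student_logits)[target])


def kd_term(student_logits, teacher_logits, temperature):
    """KL divergence from the softened teacher to the softened student."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0.0)


def classical_kd_loss(student_logits, teacher_logits, target,
                      alpha=1.0, temperature=4.0):
    """CE loss plus weighted KD term (alpha and temperature are assumed values)."""
    ce = cross_entropy(student_logits, target)
    kd = kd_term(student_logits, teacher_logits, temperature)
    return ce + alpha * temperature ** 2 * kd
```

When the student already reproduces the teacher's logits, the KD term vanishes and the objective reduces to the plain CE loss; any disagreement adds a non-negative penalty.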
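In various learning-based image restoration tasks, such as image denoising and image super-resolution, degradation representations are widely used to model the degradation process and handle complicated degradation patterns. However, they are less explored in learning-based image deblurring, because blur kernel estimation does not perform well in challenging real-world cases. We argue that modeling degradation representations is particularly necessary for image deblurring, since blur patterns typically show much larger variations than noise patterns or high-frequency textures. In this paper, we propose a framework to learn spatially adaptive degradation representations of blurry images. A novel joint image reblurring and deblurring learning process is presented to improve the expressiveness of the degradation representations. To make the learned degradation representations effective in reblurring and deblurring, we propose a Multi-Scale Degradation Injection Network (MSDI-Net) to integrate them into the neural networks. With this integration, MSDI-Net can adaptively handle various complicated blur patterns. Experiments on the GoPro and RealBlur datasets demonstrate that our proposed deblurring framework with the learned degradation representations outperforms state-of-the-art methods with appealing improvements. The code is released at https://github.com/dasongli1/learning_degradation.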
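Because data scarcity and data heterogeneity are prevalent in medical imaging, well-trained convolutional neural networks (CNNs) that use previous normalization methods may perform poorly when deployed to a new site. However, a reliable model for real-world applications should generalize well on both in-distribution (IND) and out-of-distribution (OOD) data (e.g., data from a new site). In this study, we present a novel normalization technique called window normalization (WIN), which is a simple yet effective alternative to existing normalization methods. Specifically, WIN perturbs the normalizing statistics with local statistics computed on a window of features. This feature-level augmentation technique regularizes the models well and significantly improves their OOD generalization. Taking advantage of it, we propose a novel self-distillation method called WIN-WIN to further improve OOD generalization in classification. WIN-WIN is easily implemented with two forward passes and a consistency constraint, which makes it a simple extension for existing methods. Extensive experimental results on various tasks (such as glaucoma detection, breast cancer detection, chromosome classification, and optic disc and cup segmentation) and datasets (26 datasets) demonstrate the generality and effectiveness of our methods. The code is available at https://github.com/joe1chief/windownormalizaion.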
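The core idea of normalizing with statistics taken from a window of the features, rather than from all of them, can be sketched as follows on a one-dimensional list of feature values. This is a loose illustration under stated assumptions, not the authors' implementation: the abstract does not say how the window is chosen or how local and global statistics are combined, so the window position and size are passed in explicitly here, and the function names are hypothetical.

```python
import math


def window_stats(features, start, size):
    """Mean and variance computed only on features[start:start + size]."""
    window = features[start:start + size]
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    return mean, var


def window_normalize(features, start, size, eps=1e-5):
    """Normalize ALL features with statistics taken from one window.

    Assumption for this sketch: the local statistics simply replace the
    global ones; with a window covering everything this reduces to
    ordinary normalization.
    """
    mean, var = window_stats(features, start, size)
    return [(x - mean) / math.sqrt(var + eps) for x in features]
```

Choosing a different window for each training step changes the statistics slightly each time, which is one way to read the abstract's description of the technique as a feature-level augmentation.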
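Recently, the development of machine learning (ML) potentials has made it possible to perform large-scale and long-time molecular simulations with the accuracy of quantum mechanical (QM) models. However, for high-level QM methods, such as density functional theory (DFT) at the meta-GGA level and/or with exact exchange, quantum Monte Carlo, etc., generating a sufficient amount of training data remains computationally challenging because of their high cost. In this work, we demonstrate that this issue can be largely alleviated with Deep Kohn-Sham (DeePKS), an ML-based DFT model. DeePKS employs a computationally efficient neural network-based functional model to construct a correction term that is added to a cheap DFT model. After training, DeePKS yields energies and forces that closely match those of high-level QM methods, while the amount of training data it requires is orders of magnitude smaller than that needed to train a reliable ML potential. DeePKS can therefore serve as a bridge between expensive QM models and ML potentials: one can generate a moderate amount of high-accuracy QM data to train a DeePKS model, and then use the DeePKS model to label a much larger number of configurations for training an ML potential. This scheme for periodic systems is implemented in the DFT package ABACUS, which is open source and ready for use in various applications.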